@abey79 (Member) commented Oct 3, 2025

Related

What

Introduces gRPC endpoints and an associated SDK method to access the dataset manifest table, which contains one row per layer. Also removes most layer-related columns from the partition table.
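
To make the split concrete, here is a minimal sketch in plain Python of how a per-layer manifest could relate to a per-partition table. The column names (`partition_id`, `layer_name`, `size_bytes`, `num_layers`) and the helper `partition_rows` are hypothetical illustrations, not the schema or API introduced by this PR, and the assumption that partition-level values can be aggregated from layer rows is mine.

```python
from collections import defaultdict

# Hypothetical dataset manifest: one row per layer.
# Column names are illustrative assumptions, not the PR's actual schema.
manifest = [
    {"partition_id": "p1", "layer_name": "base", "size_bytes": 100},
    {"partition_id": "p1", "layer_name": "annotations", "size_bytes": 40},
    {"partition_id": "p2", "layer_name": "base", "size_bytes": 70},
]


def partition_rows(manifest_rows):
    """Aggregate per-layer rows into one row per partition."""
    totals = defaultdict(lambda: {"num_layers": 0, "size_bytes": 0})
    for row in manifest_rows:
        agg = totals[row["partition_id"]]
        agg["num_layers"] += 1
        agg["size_bytes"] += row["size_bytes"]
    # Sort by partition id so the output order is deterministic.
    return [{"partition_id": pid, **agg} for pid, agg in sorted(totals.items())]


partitions = partition_rows(manifest)
```

In this sketch the partition table carries no per-layer columns at all; anything layer-specific has to be looked up in the manifest.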

This PR also attempts to solidify the notion that Scan{PartitionTable|DatasetManifest}Response is the One True Source(tm) of information on the returned dataframe's schema.
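
The "single source of schema" idea can be sketched roughly as follows. This is not the PR's Rust code: `ScanDatasetManifestRow`, its fields, and `to_columns` are hypothetical stand-ins, used only to show one type declaring the columns while everything else derives from it.

```python
from dataclasses import dataclass, fields


@dataclass
class ScanDatasetManifestRow:
    """Hypothetical stand-in for a scan response; field names are assumptions."""

    partition_id: str
    layer_name: str
    size_bytes: int

    @classmethod
    def schema(cls):
        # The schema is derived from the response type itself,
        # so columns are declared in exactly one place.
        return [(f.name, f.type) for f in fields(cls)]


def to_columns(rows):
    """Build a column-oriented table using only the response's schema."""
    return {
        name: [getattr(row, name) for row in rows]
        for name, _ in ScanDatasetManifestRow.schema()
    }


rows = [
    ScanDatasetManifestRow("p1", "base", 100),
    ScanDatasetManifestRow("p1", "annotations", 40),
]
columns = to_columns(rows)
```

Because `to_columns` never names a column itself, adding or renaming a field on the response type changes the resulting table without touching any other code.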

The OSS server does not yet implement the dataset manifest (RR-2482).

github-actions bot commented Oct 3, 2025

Web viewer failed to build.

| Result | Commit | Link | Manifest |
| ------ | ------ | ---- | -------- |
|  | ff20d4e | https://rerun.io/viewer/pr/11423 | +nightly +main |

Note: This comment is updated whenever you push a commit.

@abey79 changed the title from "Add grpc endpoint for layer table and cleanup helper objects" to "Introduce the layer table and remove layer information from the partition table" on Oct 3, 2025
@abey79 added the sdk-python (Python logging API), include in changelog, and dataplatform (Rerun Data Platform integration) labels on Oct 3, 2025
@abey79 force-pushed the antoine/layer-table branch from 701aed1 to 1cd8a69 on October 6, 2025 14:21
@abey79 changed the title from "Introduce the layer table and remove layer information from the partition table" to "Introduce the dataset manifest and remove layer information from the partition table" on Oct 7, 2025
@abey79 requested a review from Copilot on October 7, 2025 07:30
@Copilot (Contributor) left a comment

Pull Request Overview

This PR introduces the dataset manifest table functionality and removes layer-specific columns from the partition table. The dataset manifest provides layer-level metadata while the partition table now focuses solely on partition-level information.

  • Adds new gRPC endpoints for dataset manifest schema and scanning operations
  • Refactors partition table structure to remove layer information and add partition metadata
  • Implements dataset manifest provider for DataFusion integration

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
| ---- | ----------- |
| rerun_py/src/catalog/dataset_entry.rs | Adds manifest() method to expose dataset manifest as DataFusion table |
| rerun_py/rerun_bindings/rerun_bindings.pyi | Python type hints for new manifest() method |
| crates/store/re_server/src/store.rs | Updates partition table schema removing layer columns and adding metadata |
| crates/store/re_server/src/rerun_cloud.rs | Implements placeholder gRPC handlers for dataset manifest endpoints |
| crates/store/re_redap_client/src/lib.rs | Adds error variant for dataset manifest schema operations |
| crates/store/re_redap_client/src/connection_client.rs | Implements client method for dataset manifest schema fetching |
| crates/store/re_protos/src/v1alpha1/rerun.cloud.v1alpha1.rs | Generated protobuf code for new dataset manifest endpoints |
| crates/store/re_protos/src/v1alpha1/rerun.cloud.v1alpha1.ext.rs | Schema definitions and helper methods for dataset manifest responses |
| crates/store/re_protos/proto/rerun/v1alpha1/cloud.proto | Protocol buffer definitions for dataset manifest endpoints |
| crates/store/re_datafusion/src/partition_table.rs | Adds TODO comment for deduplication |
| crates/store/re_datafusion/src/lib.rs | Exports new DatasetManifestProvider |
| crates/store/re_datafusion/src/dataset_manifest.rs | Implements DatasetManifestProvider for DataFusion integration |
